A Linear-Time Burrows-Wheeler Transform Using Induced Sorting

نویسندگان

  • Daisuke Okanohara
  • Kunihiko Sadakane
چکیده

To compute Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first, and then obtains BWT using SA, which requires much redundant working space. In previous studies to compute BWT directly [6, 13], one constructs BWT incrementally, which requires O(n logn) time where n is the length of the input text. We present an algorithm for computing BWT directly in linear time by modifying the suffix array construction algorithm based on induced sorting [16]. We show that the working space is O(n log σ log logσ n) for any σ where σ is the alphabet size, which is the smallest among the known linear time algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Bijective String Sorting Transform

Given a string of characters, the Burrows-Wheeler Transform rearranges the characters in it so as to produce another string of the same length which is more amenable to compression techniques such as move to front, run-length encoding, and entropy encoders. We present a variant of the transform which gives rise to similar or better compression value, but, unlike the original, the transform we p...

متن کامل

RadixZip: Linear-Time Compression of Token Streams

RadixZip is a block compression technique for token streams. It introduces RadixZip Transform, a linear time algorithm that rearranges bytes using a technique inspired by radix sorting. For appropriate data, RadixZip Transform is analogous to the Burrows-Wheeler Transform used in bzip2, but is both simpler in operation and more effective in compression. In addition, RadixZip Transform can take ...

متن کامل

A Text Transformation Scheme for Degenerate Strings

The Burrows-Wheeler Transformation computes a permutation of a string of letters over an alphabet, and is well-suited to compression-related applications due to its invertability and data clustering properties. For space e ciency the input to the transform can be preprocessed into Lyndon factors. We consider scenarios with uncertainty regarding the data: a position in an indeterminate or degene...

متن کامل

Output distribution of the Burrows - Wheeler transform ' Karthik

The Burrows-Wheeler transform is a block-sorting algorithm which has been shown empirically to be useful in compressing text data. In this paper we study the output distribution of the transform for i.i.d. sources, tree sources and stationary ergodic sources. We can also give analytic bounds on the performance of some universal compression schemes which use the Burrows-Wheeler transform.

متن کامل

Two Space Saving Tricks for Linear Time LCP Array Computation

In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009